Implementation of wide multiplexers in reconfigurable logic

ABSTRACT

A reconfigurable processing device comprising one or more reconfigurable processing units is disclosed. At least one processing unit includes a computational unit having a preprocessing module for receiving n input signals and s1 selection signals and providing k output signals, wherein k < n + s1. The computational unit further comprises an m-output look-up table addressed by the k output signals of the preprocessing module, and an output multiplexer for selecting one of the m output signals of the look-up table under control of s2 further selection signals. This allows relatively large multiplexers to be implemented also in architectures using multi-bit output LUTs. In addition, a reconfigurable processing unit is described having an input multiplexer for selecting an input signal from a communication network, which input multiplexer is configurable statically or dynamically.

The present invention relates to a reconfigurable processing device. Reconfigurable logic (RL) is used to implement functions which are unknown at design time. To enable this, reconfigurable logic architectures are made generic in the sense that various logic functions can be mapped onto them. Although this offers flexibility to a user, it also leads to a large area overhead compared with the logic that would be required for, for example, a standard ASIC implementation. Because of this, the basic computational element of traditional reconfigurable logic devices such as field programmable gate arrays (FPGAs), the so-called logic cell, contains only a limited amount of logic resources, e.g. look-up tables. Moreover, the resources of a logic cell are chosen so that they reflect the common requirements of different applications. However, if coarser or more specialised functions are to be mapped, usually either many more logic cells are required or such functions cannot be mapped at all.

Reconfigurable logic devices comprise data-flow controlling elements, such as multiplexers, demultiplexers and gates, and data-flow manipulating elements, such as logic gates, adders and look-up tables. The latter allow the implementation of various functions, which can easily be redefined by loading them with different content. Reconfigurable logic devices further comprise configuration memory units (configuration memory) for storing the settings of data-flow controlling elements such as multiplexers, demultiplexers and switches. In this way connections between different parts of the reconfigurable processing unit can be rapidly redefined. Control signals for the memory cells are generated by an address decoder in response to an address offered at its input.

In particular, multiplexers of different sizes and with different operand widths are heavily used in random logic as well as in data-path applications. Despite that, the large majority of traditional reconfigurable logic architectures support the implementation of small multiplexers only. This limitation has two reasons:

1. The specific type of logic cell, which hampers a multiplexer implementation.
2. A limited number of logic cell inputs, as a result of which a logic cell cannot obtain the required number of signals from the routing resources.

For applications in which wide multiplexers are heavily used (e.g. DSP data-paths, cryptography, networking), this is an important limitation. Although this is particularly an issue for architectures with multi-bit output LUTs (moLUTs), all current FPGAs, both fine- and coarse-grained, face this problem in some way.

It is an object of the invention to provide a reconfigurable processing device which allows for the implementation of relatively large multiplexers also in architectures using multi-bit output LUTs. In order to achieve this object the reconfigurable processing device is defined by claim 1.

In the reconfigurable processing device according to the invention, the preprocessing module reduces the number of signals, i.e. the n input signals and the s1 selection signals, to a smaller set of k output signals. The combination of the preprocessing module and the look-up table can therefore handle a relatively large number of input signals compared with the look-up table alone.

Preferably the look-up table is implemented according to the definition of claim 2. This has the advantage that the decoder can be used both for writing and reading the table. The storage unit could also be used for temporarily storing data which is calculated in the reconfigurable processing device.

A practical embodiment is described in claim 3. In this embodiment the preprocessing unit can have a relatively small number of gates.

The embodiment of claim 4 introduces additional flexibility to the reconfigurable device, in that it renders it possible to configure the computational unit either as a multiplexer, or as a general look-up table.

Claim 5 provides an efficient way to implement an even wider multiplexer.

Claim 6 claims a computational unit provided with an input multiplexer for selecting signals available at a communication network. Wide input multiplexers are particularly important for this purpose, as they enable a high degree of flexibility in coupling different reconfigurable processing units of a reconfigurable processing device to each other. The input multiplexer can be configured either statically or dynamically. In the static case the selection made by the input multiplexer is determined by the values stored in the configuration memory. In the dynamic case the selection made by the input multiplexer is determined at run time by signals available at the communication network. In the case of a computational unit with a plurality of input pins, each of the pins, or a subset of them, may be coupled to a multiplexer in this way.

These and other aspects of the invention are described in more detail with respect to the drawing. Therein:

FIG. 1 schematically shows a reconfigurable processing device,

FIG. 2 shows in more detail the coupling between a reconfigurable processing unit and a communication network,

FIG. 3 shows a conventional element used in a reconfigurable processing device,

FIGS. 4A, 4B and 4C show three conventional approaches for implementing multiplexers in prior art reconfigurable devices,

FIG. 5 shows an embodiment according to the invention,

FIG. 6 shows an implementation of a preprocessing unit,

FIGS. 7A and 7B show two alternative embodiments for the preprocessing unit,

FIG. 8 shows a further embodiment of the invention,

FIG. 9 shows a computational unit having an input multiplexer which is configurable statically or dynamically.

FIG. 1 shows a reconfigurable processing device 100 comprising one or more reconfigurable processing units 1. The units 1 may communicate with each other via a communication network comprising horizontal buses 90H and vertical buses 90V. The units 1 are connected to a configuration bus CB, comprising an address bus, a data bus and control signals, which allows configuration data to be loaded into a configuration memory that controls the functioning of the units 1. In the embodiment shown, the reconfigurable processing device 100 comprises a first global decoder 60 and a second global decoder 70, which respectively activate a first control line, e.g. 61 a, and a second control line, e.g. 71 b. The reconfigurable processing unit selected by the control lines, here 1 ab, is then reconfigured with the data from the configuration bus CB. In another embodiment the reconfigurable processing device 100 may have only local address decoders. In yet another embodiment the reconfigurable processing device has no decoder at all, for example in an embodiment wherein the configuration storage units are arranged in a chain. In that case configuration takes place by successively shifting the configuration data into the chain.
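
The chain-based configuration mentioned above can be illustrated with a small behavioural sketch. It assumes, for illustration only, one configuration bit per storage cell, a single serial input at cell 0, and a chain that starts cleared; the function names `shift_in` and `load_configuration` are hypothetical and not part of the disclosure.

```python
from typing import List


def shift_in(chain: List[int], bit: int) -> List[int]:
    """One shift step: every cell takes its predecessor's value, cell 0 takes `bit`."""
    return [bit] + chain[:-1]


def load_configuration(target: List[int]) -> List[int]:
    """Load `target` into an initially cleared chain by serial shifting.

    The first bit shifted in travels furthest along the chain, so the
    bitstream is fed in the reverse order of the desired cell contents.
    A chain of n cells therefore needs exactly n shift steps.
    """
    chain = [0] * len(target)
    for bit in reversed(target):
        chain = shift_in(chain, bit)
    return chain


print(load_configuration([1, 1, 0, 1, 0]))  # [1, 1, 0, 1, 0]
```

The trade-off this illustrates is that no address decoder is needed, at the cost of reloading the whole chain to change a single setting.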

For many purposes it is necessary to select input signals for a reconfigurable processing unit 1 from a plurality of signals available in the communication network. FIG. 2 shows an example wherein the multiplexer 13 selects an input signal from the signals available at a horizontal bus 90H of the communication network.

Reconfigurable logic architectures with multi-bit output LUTs (moLUTs) have been shown to offer multi-functionality at a reduced implementation cost. The most common way of implementing moLUTs is similar to that of traditional SRAMs, that is, one decoder is used for addressing several (LUT) memory columns. A 4-input LUT has been found to be the most area-efficient for random logic implementation. For that reason, a 4-input LUT is typically used in moLUT-based devices. The LUT type determines the total number of logic cell input pins: a device with a 4-input moLUT has four pins. If an additional 2:1 multiplexer is placed at the LUT outputs, the result is at most five pins (see FIG. 3). This is not enough even to implement a 4:1 multiplexer, which requires six inputs (four primary and two selection inputs). Thus, typically only very small multiplexers can be implemented in such devices.
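
The pin argument above is simple arithmetic: a 2^s:1 multiplexer needs 2^s data inputs plus s selection inputs. The short sketch below compares that count with the five pins of a 4-input moLUT cell that has one extra 2:1 output multiplexer (FIG. 3); the constant names are illustrative.

```python
def mux_pins(select_bits: int) -> int:
    """Pins needed by a 2**select_bits : 1 multiplexer: data inputs plus select inputs."""
    return 2 ** select_bits + select_bits


LUT_INPUTS = 4                               # 4-input moLUT
OUTPUT_MUX_SELECTS = 1                       # extra 2:1 multiplexer at the LUT outputs
CELL_PINS = LUT_INPUTS + OUTPUT_MUX_SELECTS  # 5 pins in total

for s in (1, 2, 3):
    need = mux_pins(s)
    verdict = "fits" if need <= CELL_PINS else "does not fit"
    print(f"{2 ** s}:1 multiplexer needs {need} pins -> {verdict}")
# 2:1 multiplexer needs 3 pins -> fits
# 4:1 multiplexer needs 6 pins -> does not fit
# 8:1 multiplexer needs 11 pins -> does not fit
```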

FIGS. 4A, 4B and 4C show some prior art implementations of a multiplexer. FIG. 4A shows a first approach, in which the multiplexer is implemented by dedicated circuitry. In such architectures the multiplexer function is implemented by programming connections in a fixed topology of logic gates, as well as by selecting the right operands for them (e.g. constants).

The second approach, shown in FIG. 4B, is characteristic of the multiplexer-based devices designed by Actel. In such devices, a logic function is implemented by programming the multiplexer inputs in the way required by the mapped function. Multiplexer-based reconfigurable logic devices are of fine granularity and contain a small set of 2:1 multiplexers (usually three). Thus, at most a 4:1 multiplexer can be implemented in such a structure.
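
Why three 2:1 multiplexers give at most a 4:1 multiplexer can be seen from a generic two-level multiplexer tree. This is a minimal behavioural sketch, assuming c0 selects within each pair and c1 selects between the pairs; it does not reproduce the wiring of any particular vendor's logic module.

```python
from itertools import product


def mux2(a: int, b: int, sel: int) -> int:
    """2:1 multiplexer: passes a when sel == 0 and b when sel == 1."""
    return b if sel else a


def mux4_tree(x0: int, x1: int, x2: int, x3: int, c0: int, c1: int) -> int:
    """4:1 multiplexer built from three 2:1 multiplexers in two levels."""
    low = mux2(x0, x1, c0)       # first level, controlled by c0
    high = mux2(x2, x3, c0)      # first level, controlled by c0
    return mux2(low, high, c1)   # second level, controlled by c1


# Exhaustive check: (c0, c1) selects x[c0 + 2*c1].
for x0, x1, x2, x3, c0, c1 in product((0, 1), repeat=6):
    assert mux4_tree(x0, x1, x2, x3, c0, c1) == (x0, x1, x2, x3)[c0 + 2 * c1]
```

An 8:1 multiplexer would need seven 2:1 multiplexers in three levels, which exceeds the set of three available in such a cell.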

The third type of implementation, shown in FIG. 4C, uses look-up tables (LUTs), which is typical of most current FPGAs, e.g. from Atmel and Xilinx.

According to this approach, the look-up tables in a logic cell implement single multiplexers of limited size (typically a 4:1 multiplexer), while wider multiplexers are created by using additional 2:1 multiplexers present at the look-up table outputs.
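
The LUT-based approach can be sketched by programming the 16-entry truth table of a 4-input LUT. A single 4-input LUT holds a 2:1 multiplexer (three of its four inputs); the additional 2:1 multiplexer at the LUT outputs then combines two such LUTs into a 4:1 multiplexer. The address bit order and the helper names below are assumptions made for illustration.

```python
from itertools import product


def lut4_contents(func):
    """16-entry truth table of a 4-input function; address = a0 + 2*a1 + 4*a2 + 8*a3."""
    return [func((a >> 0) & 1, (a >> 1) & 1, (a >> 2) & 1, (a >> 3) & 1) for a in range(16)]


def lut4_read(table, a0, a1, a2, a3):
    """Read one entry, i.e. evaluate the programmed function."""
    return table[a0 | (a1 << 1) | (a2 << 2) | (a3 << 3)]


# Each LUT is programmed as a 2:1 multiplexer (d0, d1, sel); its fourth input is unused.
MUX2_TABLE = lut4_contents(lambda d0, d1, sel, unused: d1 if sel else d0)


def mux4_two_luts(x0, x1, x2, x3, c0, c1):
    lo = lut4_read(MUX2_TABLE, x0, x1, c0, 0)   # first LUT selects x0 / x1
    hi = lut4_read(MUX2_TABLE, x2, x3, c0, 0)   # second LUT selects x2 / x3
    return hi if c1 else lo                     # additional 2:1 multiplexer at the LUT outputs


for x0, x1, x2, x3, c0, c1 in product((0, 1), repeat=6):
    assert mux4_two_luts(x0, x1, x2, x3, c0, c1) == (x0, x1, x2, x3)[c0 + 2 * c1]
```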

FIG. 5 shows a computational unit in an embodiment of a reconfigurable processing device according to the invention. The computational unit 10 comprises a preprocessing module 11 for receiving n = 4 input signals x₀, x₁, x₂, x₃ and s1 = 1 selection signal, namely c₁, and providing k = 4 output signals y₀, y₁, y₂, y₃, wherein k < n + s1. It further comprises an m = 2-output look-up table 12, 13 a, 13 b, addressed by the k output signals of the preprocessing module, and an output multiplexer 14 for selecting one of the m = 2 output signals of the look-up table 12, 13 a, 13 b as the output signal F under control of s2 = 1 further selection signal, c₀.

The method proposed here is based on decomposing the multiplexer function in such a way that it can be mapped onto a multi-bit output LUT enhanced with a small amount of extra logic. By way of example, this is shown for the implementation of a 4:1 multiplexer on a multi-bit output LUT with four inputs and two outputs (4/2-LUT).

A 4:1 multiplexer can be described by a logic function F of six variables: x₀, x₁, x₂, x₃, being the inputs of the multiplexer, and c₀, c₁, being the control (selection) signals, as shown by the following equation:

$$F(x_0,x_1,x_2,x_3,c_0,c_1) = \bar{c}_0\bar{c}_1 x_0 + c_0\bar{c}_1 x_1 + \bar{c}_0 c_1 x_2 + c_0 c_1 x_3 \tag{1}$$

This equation can be further modified to the form:

$$F(x_0,x_1,x_2,x_3,c_0,c_1) = \bar{c}_0 A + c_0 B, \tag{2}$$

wherein:

$$A = \bar{c}_1 x_0 + c_1 x_2, \tag{3a}$$

$$B = \bar{c}_1 x_1 + c_1 x_3. \tag{3b}$$

Eqn. 2 describes a 2:1 multiplexer with inputs A and B, c₀ being the control signal. This multiplexer can be mapped onto the 2:1 multiplexer which is present at the outputs of the 4/2-LUT. Such a mapping is possible only if both functions A and B can be encoded in the memories (memory columns) of the moLUT. Functions A and B together require five different logic variables (x₀, x₁, x₂, x₃ and c₁), while the given 4/2-LUT has only four inputs. However, all partial products of the functions A and B share the same logic variable c₁. If it is assumed that the partial products $y_0 = \bar{c}_1 x_0$, $y_1 = \bar{c}_1 x_1$, $y_2 = c_1 x_2$ and $y_3 = c_1 x_3$ are generated outside the LUT, then these partial product results y₀, y₁, y₂, y₃ can be treated as inputs of this LUT. Thus, a 4/2-LUT implements the functions A and B in the form of Eqn. 4:

$$A = y_0 + y_2, \tag{4a}$$

$$B = y_1 + y_3. \tag{4b}$$

The only modification required with respect to the standard 4/2-LUT implementation is the addition of a relatively small preprocessing unit 11.
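
The decomposition of Eqns. 1 to 4 can be checked with a behavioural model of the FIG. 5 computational unit: the preprocessing unit forms the partial products yᵢ, the two LUT columns are programmed with A = y₀ + y₂ and B = y₁ + y₃, and the output multiplexer selects between them under c₀. This is a sketch, not a hardware description; the address bit order (y₀ as least significant bit) and all function names are assumptions.

```python
from itertools import product


def preprocess(x0, x1, x2, x3, c1):
    """Preprocessing unit 11 (FIG. 6): four AND gates and one inverter."""
    nc1 = 1 - c1                                  # inverter 11e
    return x0 & nc1, x1 & nc1, x2 & c1, x3 & c1   # y0, y1, y2, y3


def program_columns():
    """Contents of the two 16-entry LUT columns: A = y0 + y2, B = y1 + y3 (Eqn. 4)."""
    col_a, col_b = [], []
    for addr in range(16):
        y0, y1, y2, y3 = [(addr >> i) & 1 for i in range(4)]  # assumed: y0 is the LSB
        col_a.append(y0 | y2)
        col_b.append(y1 | y3)
    return col_a, col_b


COL_A, COL_B = program_columns()


def computational_unit(x0, x1, x2, x3, c0, c1):
    """Computational unit 10 of FIG. 5 configured as a 4:1 multiplexer."""
    y0, y1, y2, y3 = preprocess(x0, x1, x2, x3, c1)
    addr = y0 | (y1 << 1) | (y2 << 2) | (y3 << 3)  # decoder 12 selects one row
    a, b = COL_A[addr], COL_B[addr]                # the m = 2 LUT outputs
    return b if c0 else a                          # output multiplexer 14 (Eqn. 2)


# Exhaustive check against Eqn. 1: (c0, c1) selects x[c0 + 2*c1].
for x0, x1, x2, x3, c0, c1 in product((0, 1), repeat=6):
    assert computational_unit(x0, x1, x2, x3, c0, c1) == (x0, x1, x2, x3)[c0 + 2 * c1]
```

Note that the LUT itself never sees c₁ directly: the preprocessing unit has already folded it into the yᵢ, which is what lets six logical variables fit onto four LUT inputs plus one output-multiplexer select.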

The principle described above is not limited to the implementation of 4:1 multiplexers. Analogously, any selection function F for selecting an output signal from the input signals $x_0,\dots,x_k$,

$$F(x_0,\dots,x_k,c_0,\dots,c_L) = \bar{c}_0\bar{c}_1\cdots\bar{c}_L\,x_0 + \dots + c_0 c_1\cdots c_L\,x_k, \tag{5}$$

in which each input $x_j$ is ANDed with the product term of the selection variables encoding its index $j$ (with $c_0$ as the least significant bit, as in Eqn. 1), may be rewritten in the form

$$F(x_0,\dots,x_k,c_0,\dots,c_L) = \bar{c}_0 A + c_0 B, \tag{6}$$

where A is a function of the variables $x_0, x_2, \dots, x_{2i}$ and $c_1,\dots,c_L$, and B is a function of the variables $x_1, x_3, \dots, x_{2i+1}$ and $c_1,\dots,c_L$. The function A in turn can be rewritten as the logical OR of the variables $y_0, y_2, \dots, y_{2i}$, wherein $y_i$ is a function which depends only on the variable $x_i$ and the selection variables $c_1,\dots,c_L$.

Likewise, the function B can be rewritten as the logical OR of the variables $y_1, y_3, \dots, y_{2i+1}$, wherein $y_i$ is a function which depends only on the variable $x_i$ and the selection variables $c_1,\dots,c_L$. Again, the functions $y_i$ can be calculated with a preprocessing unit, while the resulting function F is calculated from the values $y_i$ and the remaining selection variable $c_0$.
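
The general decomposition of Eqns. 5 and 6 can be illustrated for an arbitrary number of selection variables. In this sketch each $y_j$ is $x_j$ ANDed with the product term of $c_1,\dots,c_L$ that matches bits 1 to L of the index $j$; A and B are the ORs of the even- and odd-indexed $y_j$, and $c_0$ makes the final choice. The function names are hypothetical, and the check uses L = 2, i.e. an 8:1 multiplexer.

```python
from itertools import product


def preprocess_general(xs, cs):
    """y_j = x_j AND (product term of c1..cL matching bits 1..L of the index j)."""
    L = len(cs) - 1
    ys = []
    for j, x in enumerate(xs):
        match = all(((j >> b) & 1) == cs[b] for b in range(1, L + 1))
        ys.append(x & int(match))
    return ys


def select_general(xs, cs):
    """F = not(c0).A + c0.B with A, B the ORs of the even / odd y_j (Eqn. 6)."""
    ys = preprocess_general(xs, cs)
    a = int(any(ys[0::2]))   # A: OR of y0, y2, ..., y_2i
    b = int(any(ys[1::2]))   # B: OR of y1, y3, ..., y_2i+1
    return b if cs[0] else a


L = 2  # selection variables c0, c1, c2 -> 8:1 multiplexer
N = 2 ** (L + 1)
for bits in product((0, 1), repeat=N + L + 1):
    xs, cs = list(bits[:N]), list(bits[N:])
    index = sum(c << b for b, c in enumerate(cs))   # c0 is the least significant bit
    assert select_general(xs, cs) == xs[index]
```

For a given selection word, exactly one even-indexed and one odd-indexed $y_j$ can be non-zero, which is why the plain OR in each LUT column is sufficient.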

The preprocessing unit for generating the values $y_i$ is characterized in that the number n of input signals $x_i$ is equal to the number k of output signals $y_i$, and in that the preprocessing unit has a first input for receiving a first selection signal $c_1$, as well as further inputs for receiving the further selection signals $c_2,\dots,c_L$. In response to each of the input signals $x_i$, the preprocessing unit generates an output signal $y_i$. For one half of the output signals $y_i$ the value is the logical AND of the corresponding input signal $x_i$ and the first selection signal $c_1$, and for the other half of the output signals $y_i$ the value is the logical AND of the corresponding input signal $x_i$ and the inverse of the first selection signal $c_1$.

By way of example, FIG. 6 shows a preprocessing unit 11 in a practical embodiment of the invention. The preprocessing unit comprises four logical AND gates 11 a, 11 b, 11 c, 11 d and an inverter 11 e to calculate the values y₀, …, y₃. The values y₀ and y₁ are obtained as the logical AND of their corresponding input signals x₀, x₁ and the inverse of the first selection signal c₁. The values y₂ and y₃ are obtained as the logical AND of their corresponding input signals x₂, x₃ and the first selection signal c₁ itself.

Preferably, the preprocessing module comprises a mode select input for selecting a further operational mode, in which the output signals yᵢ are identical to their corresponding input signals xᵢ. If a logic function of only four inputs is to be implemented in the modified 4/2-LUT, then the input processing block can either be bypassed (FIG. 7A) or be used in such a way that the primary logic inputs are passed through it without any conversion (FIG. 7B). In FIG. 7A, parts referred to by reference numbers with a single quote (′) correspond to parts having the same reference number in FIG. 6. In FIG. 7B, parts referred to by reference numbers with a double quote (″) correspond to parts having a single quote in FIG. 7A.

In the embodiment shown in FIG. 7A each of the AND gates 11 a′, . . . 11 d′ is coupled to an output via a respective auxiliary multiplexer 15 a′, . . . 15 d′. The multiplexers are coupled via a control input 17′ of the preprocessing unit 11′ to an output of the configuration memory 30′.

In the embodiment of FIG. 7B, the signal of the inverter 11 e″ is bypassed in the further operational mode. To that end, the auxiliary multiplexers 16 a″ and 16 c″ are coupled to a control input 17″, which is coupled to the configuration memory 30″. In the further operational mode the auxiliary multiplexers select the logical value 1 as their output signals. As a result, each of the output signals yᵢ is equal to xᵢ. An additional advantage is that the auxiliary multiplexers 16 a″ and 16 c″ do not delay the signals y₀, …, y₃.
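
The two operating modes of the FIG. 7B preprocessing unit can be sketched behaviourally. The sketch assumes that auxiliary multiplexer 16 a″ gates the AND gates for y₀, y₁ and 16 c″ gates those for y₂, y₃, and that a `mode_select` value of 1 (read from configuration memory 30″) selects the further operational mode; this assignment and the encoding are illustrative assumptions.

```python
from itertools import product


def preprocess_fig7b(x0, x1, x2, x3, c1, mode_select):
    """Behavioural model of preprocessing unit 11'' (FIG. 7B).

    mode_select == 0: multiplexer mode; y0, y1 are gated by NOT c1, y2, y3 by c1.
    mode_select == 1: further operational mode; both auxiliary multiplexers
    output the constant 1, so every y_i equals x_i.
    """
    gate_01 = 1 if mode_select else 1 - c1   # auxiliary multiplexer 16a'' (assumed to drive y0, y1)
    gate_23 = 1 if mode_select else c1       # auxiliary multiplexer 16c'' (assumed to drive y2, y3)
    return (x0 & gate_01, x1 & gate_01, x2 & gate_23, x3 & gate_23)


for x0, x1, x2, x3, c1 in product((0, 1), repeat=5):
    xs = (x0, x1, x2, x3)
    # Further operational mode: pass-through, c1 is ignored.
    assert preprocess_fig7b(*xs, c1, mode_select=1) == xs
    # Multiplexer mode: the same partial products as in FIG. 6.
    assert preprocess_fig7b(*xs, c1, mode_select=0) == (
        x0 & (1 - c1), x1 & (1 - c1), x2 & c1, x3 & c1)
```

Because the auxiliary multiplexers sit on the gating path rather than between the AND gates and the outputs, the xᵢ to yᵢ path still passes through only one AND gate in either mode.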

FIG. 8 shows a further embodiment of a reconfigurable processing device according to the invention. It comprises an enlarged computational unit having a first computational unit 10 a and a second computational unit 10 b. Each of the computational units 10 a, 10 b comprises a preprocessing module, an m-output look-up table and an output multiplexer. In the embodiment shown, the computational units 10 a, 10 b are identical to the one shown in FIG. 5. The enlarged computational unit shown in FIG. 8 further comprises a further multiplexer 18 for selecting the output signal of either the first unit 10 a or the second unit 10 b as its output signal F in response to a further selection signal c₂.
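
The enlarged computational unit of FIG. 8 can be modelled by reusing a compact version of the FIG. 5 unit twice and adding the further multiplexer 18. This sketch assumes that both units share the selection signals c₀ and c₁ and that unit 10 a handles x₀ to x₃ while unit 10 b handles x₄ to x₇; the text does not state these assignments, and the function names are hypothetical.

```python
from itertools import product


def fig5_unit(xs, c0, c1):
    """Compact model of one FIG. 5 unit: preprocessing, 4/2-LUT and output multiplexer 14."""
    nc1 = 1 - c1
    y0, y1, y2, y3 = xs[0] & nc1, xs[1] & nc1, xs[2] & c1, xs[3] & c1
    a, b = y0 | y2, y1 | y3          # the two LUT columns (Eqn. 4)
    return b if c0 else a            # output multiplexer 14


def enlarged_unit(xs, c0, c1, c2):
    """FIG. 8: units 10a and 10b plus the further multiplexer 18 controlled by c2."""
    out_a = fig5_unit(xs[0:4], c0, c1)   # unit 10a (assumed: x0..x3)
    out_b = fig5_unit(xs[4:8], c0, c1)   # unit 10b (assumed: x4..x7)
    return out_b if c2 else out_a        # further multiplexer 18


# Exhaustive check: the enlarged unit behaves as an 8:1 multiplexer.
for bits in product((0, 1), repeat=11):
    xs, (c0, c1, c2) = bits[:8], bits[8:]
    assert enlarged_unit(xs, c0, c1, c2) == xs[c0 + 2 * c1 + 4 * c2]
```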

FIG. 9 shows a reconfigurable processing device wherein the computational unit 10 is arranged as an input multiplexer for a reconfigurable logic unit 1. The computational unit 10 selects an input signal for the reconfigurable logic unit 1 from the signals available at a communication network 90H, 90V. Although in the embodiment of FIG. 9 the reconfigurable logic unit 1 has only one input, it may have a plurality of inputs. Each of those inputs, or a subset of them, may be coupled to a computational unit 10 which is arranged as an input multiplexer.

In the embodiment shown in FIG. 9 the selection signals for the computational unit 10 are provided by auxiliary computational units 20, 21, 22. Each auxiliary computational unit 20, 21, 22 selects as its input signal either a signal from a configuration memory M2, M3 or M4, respectively, or a signal from the communication network 90V. This selection is made in response to an auxiliary selection signal provided by the configuration memory M1.
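
The difference between static and dynamic configuration of the input multiplexer can be sketched as follows. The sketch assumes that M1 = 0 makes the auxiliary units pass the memory bits M2, M3, M4 (static case) and M1 = 1 makes them pass three signals from bus 90V (dynamic case), and that the computational unit 10 acts as an 8:1 multiplexer over eight wires of bus 90H. The encoding, bus widths and the names `ConfigMemory`, `auxiliary_units` and `input_multiplexer` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class ConfigMemory:
    """Hypothetical configuration bits of FIG. 9 (encoding assumed)."""
    m1: int  # auxiliary selection signal: 0 = use M2..M4 (static), 1 = use network (dynamic)
    m2: int
    m3: int
    m4: int


def auxiliary_units(cfg: ConfigMemory, network_v: Tuple[int, int, int]) -> Tuple[int, ...]:
    """Units 20, 21, 22: each passes its memory bit or a network signal, steered by M1."""
    static_bits = (cfg.m2, cfg.m3, cfg.m4)
    return tuple(net if cfg.m1 else mem for mem, net in zip(static_bits, network_v))


def input_multiplexer(bus_h: Sequence[int], select: Sequence[int]) -> int:
    """Computational unit 10 used as an 8:1 input multiplexer on bus 90H."""
    index = select[0] | (select[1] << 1) | (select[2] << 2)
    return bus_h[index]


bus_h = (0, 0, 0, 0, 0, 1, 0, 0)   # example values on eight wires of bus 90H

static_cfg = ConfigMemory(m1=0, m2=1, m3=0, m4=1)
print(input_multiplexer(bus_h, auxiliary_units(static_cfg, (0, 0, 0))))   # 1 (wire 5)

dynamic_cfg = ConfigMemory(m1=1, m2=1, m3=0, m4=1)
print(input_multiplexer(bus_h, auxiliary_units(dynamic_cfg, (0, 0, 0))))  # 0 (wire 0)
```

In the static case the selected wire changes only when the configuration memory is rewritten; in the dynamic case it can change on every evaluation as the signals on 90V change.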

It is remarked that the scope of protection of the invention is not restricted to the embodiments described herein. Neither is the scope of protection of the invention restricted by the reference numerals in the claims. The word ‘comprising’ does not exclude parts other than those mentioned in a claim. The word ‘a(n)’ preceding an element does not exclude a plurality of those elements. Means forming part of the invention may be implemented either in the form of dedicated hardware or in the form of a programmed general-purpose processor. The invention resides in each new feature or combination of features.

1. Reconfigurable processing device (100) comprising one or more reconfigurable processing units (1) including a computational unit (10) having a preprocessing module (11) for receiving n input signals (x₀, … x₃) and s1 selection signals (c₁), and providing k output signals (y₀, … y₃), wherein k < n + s1, an m-output look-up table (12, 13 a, 13 b) being addressed by the k output signals of the preprocessing module (11), and an output multiplexer (14) for selecting one of the m output signals of the look-up table (12, 13 a, 13 b) under control of s2 further selection signals (c₀).
 2. Reconfigurable processing device according to claim 1, characterized in that the look-up table includes a k:2^k decoder (12) for decoding k address signals (y₀, … y₃) into 2^k control signals and a storage unit (13 a, 13 b) comprising m columns of 2^k storage elements each.
 3. Reconfigurable processing device according to claim 1, characterized in that n = k and that the preprocessing unit (11) has a first input for receiving a first selection signal (c₁), wherein the preprocessing unit (11) in an operational mode generates in response to each of the input signals (xᵢ) an output signal (yᵢ), wherein for one half of the output signals (y₂, y₃) the value is the logical AND function of the corresponding input signal (x₂, x₃ resp.) and the first selection signal (c₁) and for another half of the output signals (y₀, y₁) the value is the logical AND function of the corresponding input signal (x₀, x₁ resp.) and the inverse of the first selection signal (c₁).
 4. Reconfigurable processing device according to claim 3, characterized in that the preprocessing unit (11′; 11″) comprises a mode select input (17′, 17″) for selecting a further operational mode, in which further operational mode the output signals (yᵢ) are identical to their corresponding input signals (xᵢ).
 5. Reconfigurable processing device according to claim 1, characterized in that it comprises an enlarged computational unit having a first and a second computational unit (10 a, 10 b) comprising each a preprocessing module (11), a m-output look-up table (12, 13 a, 13 b), and an output multiplexer (14), the enlarged computational unit further comprising a further multiplexer (18) for selecting an output signal of either the first or the second unit (10 a, 10 b) as its output signal (F) in response to a further selection signal (c₂).
 6. Reconfigurable processing device comprising one or more computational units (1) with at least one input terminal which is coupled via an input multiplexer (10) to a communication network (90V, 90H), characterized in that selection signals for the input multiplexer of the computational unit (1) are provided by auxiliary selection units (20, 21, 22), each auxiliary selection unit selecting either a signal from a configuration memory (M2, M3, M4) or from the communication network (90V) as its input signal, in response to an auxiliary selection signal (M1) provided by the configuration memory.